Abstract: The fast-growing field of compressed sensing is founded on the fact that if a signal is simple and has some 'structure', then it can be reconstructed accurately with far fewer samples than its ambient dimension. Many different plausible structures have been explored in this field, ranging from sparsity to low-rankness and to finite rate of innovation. However, there are important abstract questions that are yet to be answered. For instance, what are the general abstract meanings of structure and simplicity? Do there exist universal algorithms for recovering such simple structured objects from fewer samples than their ambient dimension? In this paper, we aim to address these two questions. Using algorithmic information theory tools such as Kolmogorov complexity, we provide a unified method of describing simplicity and structure. We then explore the performance of an algorithm motivated by Occam's Razor (called MCP, for minimum complexity pursuit) and show that it requires $O(k\log n)$ samples to recover a signal, where $k$ and $n$ represent its complexity and ambient dimension, respectively. Finally, we discuss more general classes of signals and provide guarantees on the performance of MCP.



I. Introduction 

Compressed sensing (CS) refers to a body of techniques that undersample high-dimensional signals and yet recover them accurately by exploiting their intrinsic 'structure' [1], [2]. This permits more efficient sensing systems, which have proved valuable in many applications, including magnetic resonance imaging (MRI) [3] and radar [4], to name a few. Some of the 'structures' that have been considered in the literature are as follows.

i. Sparsity: A vector $x \in \mathbb{R}^n$ is called $k$-sparse if and only if $\|x\|_0 = \sum_{i=1}^n \mathbb{I}_{\{x_i \neq 0\}} \le k$. Roughly speaking, according to compressed sensing, a $k$-sparse signal $x$ can be recovered from $d = O(k\log n)$ random linear measurements $y = Ax$.

ii. Low rankness: If $X \in \mathbb{R}^{m\times n}$ is a low-rank matrix with $\mathrm{rank}(X) \le r$, then $d = O(r(m+n)\log(mn))$ random linear measurements are sufficient for recovering $X$ accurately from its measurements with high probability [5].

iii. Model-based compressed sensing: [6] considers more structured signal models by assuming that, of the $\binom{n}{k}$ subspaces of $k$-sparse signals, only $m_k$ may occur. It is then proved that $O(\log m_k)$ random linear measurements are sufficient for the accurate recovery of such signals. This class is a superset of some of the other structures introduced in the literature, such as the class of block-sparse signals [7]-[10].
iv. Rate of innovation: [11] defines the rate of innovation of a signal as its "degrees of freedom". Several important classes of functions, such as piecewise polynomial functions and sparse signals, clearly have a finite rate of innovation. [11] suggests sampling schemes for several classes that recover the signal from $O(k)$ measurements, where $k$ is the rate of innovation.

The above results seem to provide pieces of a bigger picture. Recently, [12] introduced the class of simple functions and the atomic norm as a framework that unifies some of the above observations and extends them to other signal classes. However, interesting conceptual questions remain. What is the abstract meaning of 'structure' that allows fewer measurements than the ambient dimension of the signal? And, given a simple signal, which scheme recovers it from an undersampled random linear set of measurements?

In the context of algorithmic information theory, Solomonoff [13] and Kolmogorov [14] suggested a universal notion of complexity for binary sequences, known as the Kolmogorov complexity. Given a binary sequence $x$, its Kolmogorov complexity $K(x)$ is defined as the length of the shortest computer program that prints $x$. In this paper, we extend the concept of Kolmogorov complexity to real-valued signals. Such extensions are straightforward and have been explored before [15]. Based on this notion, which we call the Kolmogorov complexity of real signals, we show that Occam's razor [16], i.e., finding the 'simplest' solution of the linear equations, correctly recovers the signal from far fewer measurements than its ambient dimension. Roughly speaking, we prove that the number of linear measurements required for recovering the correct solution is proportional to the complexity rather than to the ambient dimension of the signal. We postpone the precise statement of our results to Section IV. We further discuss the issue of model mismatch in the signal classes and prove that the approach motivated by Occam's razor is stable with respect to such non-idealities in the system.

The paper is organized as follows. Section II defines the notation used throughout the paper. Section III defines the Kolmogorov complexity of a real-valued signal. Section IV outlines our contribution. Section V calculates the Kolmogorov complexity of several classes that are popular in compressed sensing and clarifies the statements of our theorems on these classes. Section VI compares our work with other results in the literature. Sections VII and VIII are devoted to the proofs of our main theorems.

II. Definitions 

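Calligraphic letters such as $\mathcal{A}$ and $\mathcal{B}$ denote sets. For a set $\mathcal{A}$, $|\mathcal{A}|$ and $\mathcal{A}^c$ denote its size and its complement, respectively. For a sample space $\Omega$ and event set $\mathcal{A} \subseteq \Omega$, $\mathbb{I}_{\mathcal{A}}$ denotes the indicator function of the event $\mathcal{A}$.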

Let $\{0,1\}^*$ denote the set of all finite-length binary sequences, i.e., $\{0,1\}^* = \bigcup_{n\ge1}\{0,1\}^n$. For a vector $x \in \mathbb{R}^n$, the $\ell_p$ norm of $x$ is defined as $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$. The $\ell_\infty$ norm of $x$ is defined as $\|x\|_\infty = \max_i |x_i|$.

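For a real number $x \in [0,1]$, let $[x]_m$ denote the $m$-bit approximation of $x$ that results from taking the first $m$ bits in the binary expansion of $x$. In other words, if $x = \sum_{i=1}^\infty (x)_i 2^{-i}$, where $(x)_i \in \{0,1\}$ denotes the $i$th bit in the binary expansion of $x$, then

$$[x]_m \triangleq \sum_{i=1}^m (x)_i 2^{-i}. \qquad (1)$$

Similarly, for a vector $x^n \in [0,1]^n$, define

$$[x^n]_m \triangleq ([x_1]_m, [x_2]_m, \ldots, [x_n]_m). \qquad (2)$$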

For an integer $n \in \mathbb{N}$, let

$$\log^* n = \lceil\log_2 n\rceil + 2\log_2\max(\lceil\log_2 n\rceil, 1).$$
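
The quantizer (1)-(2) and $\log^*$ are simple enough to compute directly. The following Python sketch is a minimal illustration under stated assumptions: the function names are hypothetical, and for $x = 1$ it uses the expansion $0.111\ldots_2$, so that $[1]_m = 1 - 2^{-m}$. Truncating $x2^m$ keeps exactly the first $m$ bits, so $0 \le x - [x]_m \le 2^{-m}$.

```python
import math


def quantize(x: float, m: int) -> float:
    """[x]_m: the first m bits of the binary expansion of x in [0, 1], as in (1)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    # Truncate rather than round. For x = 1 we use the expansion 0.111..._2
    # (an assumed convention), so the result is 1 - 2^-m.
    q = min(math.floor(x * 2**m), 2**m - 1)
    return q / 2**m


def quantize_vec(xs, m):
    """[x^n]_m as in (2): apply the m-bit truncation to every coordinate."""
    return [quantize(x, m) for x in xs]


def log_star(n: int) -> float:
    """log* n = ceil(log2 n) + 2 log2 max(ceil(log2 n), 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    c = math.ceil(math.log2(n))
    return c + 2 * math.log2(max(c, 1))


print(quantize(0.7, 3))          # 0.7 = 0.1011..._2, so [0.7]_3 = 0.101_2
print(quantize_vec([0.25, 0.9], 2))
print(log_star(1024))
```
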

III. Kolmogorov Complexity

The Kolmogorov complexity of a finite-length sequence $x$ with respect to a universal computer $\mathcal{U}$ is defined as the minimum length over all programs that print $x$ and halt.¹ For a universal computer $\mathcal{U}$ and any computer $\mathcal{A}$, there exists a constant $c_{\mathcal{A}}$ such that $K_{\mathcal{U}}(x) \le K_{\mathcal{A}}(x) + c_{\mathcal{A}}$ for all strings $x \in \{0,1\}^*$ [17]. Hence, as suggested in [17], we drop the subscript $\mathcal{U}$ and let $K(x)$ denote the Kolmogorov complexity of the binary string $x$.

Similarly, the Kolmogorov complexity of an integer $n \in \mathbb{N}$, $K(n)$, is defined as the Kolmogorov complexity of its binary representation. It can be proved that

$$K(n) \le \log^* n + c,$$

where $c$ is a constant independent of $n$.

For $x = (x_1, x_2, \ldots, x_n) \in [0,1]^n$, define the Kolmogorov complexity of $x$ at resolution $m$ as

$$K^{[\cdot]_m}(x) = K([x_1]_m, [x_2]_m, \ldots, [x_n]_m). \qquad (3)$$

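Lemma 1: For $(x_1, x_2, \ldots, x_n) \in [0,1]^n$,

$$\limsup_{m\to\infty}\frac{K^{[\cdot]_m}(x_1, x_2, \ldots, x_n)}{m} \le n.$$

The proof is simple: $[x^n]_m$ can be printed by a program that stores its $mn$ bits verbatim, together with $O(\log^* m + \log^* n)$ bits specifying $m$ and $n$ and a constant-length printing routine; dividing by $m$ and letting $m \to \infty$ gives the claim.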

Definition 1: The signal $x = (x_1, x_2, \ldots, x_n)$ is called incompressible if and only if

$$\lim_{m\to\infty}\frac{K^{[\cdot]_m}(x_1, x_2, \ldots, x_n)}{m} = n.$$

¹Refer to Chapter 14 of [17] for the exact definition of a universal computer and for more details on the definition of the Kolmogorov complexity.



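Proposition 1: Let $\{X_i\}_{i=1}^n \stackrel{\text{iid}}{\sim} U[0,1]$. Then,

$$\frac{1}{m}K^{[\cdot]_m}(X_1, X_2, \ldots, X_n) \to n$$

in probability, as $m \to \infty$.

Proof: If $X_i = \sum_{j=1}^\infty (X_i)_j 2^{-j}$, where $(X_i)_j \in \{0,1\}$, then $\{(X_i)_j\}_{j=1}^\infty \stackrel{\text{iid}}{\sim} \mathrm{Bern}(1/2)$. Theorem 14.5.3 in [17] states that the normalized Kolmogorov complexity of $([X_1]_m, \ldots, [X_n]_m) = \{((X_i)_1, (X_i)_2, \ldots, (X_i)_m)\}_{i=1}^n$, given its length, converges to one, i.e.,

$$\frac{1}{mn}K\left(\{(X_i)_1, (X_i)_2, \ldots, (X_i)_m\}_{i=1}^n \,\middle|\, mn\right) \to 1 \qquad (4)$$

in probability. On the other hand,

$$\begin{aligned} K\left(\{(X_i)_1, \ldots, (X_i)_m\}_{i=1}^n \mid mn\right) &\le K\left(\{(X_i)_1, \ldots, (X_i)_m\}_{i=1}^n\right)\\ &\le K\left(\{(X_i)_1, \ldots, (X_i)_m\}_{i=1}^n \mid mn\right) + \log^*(mn) + c, \end{aligned} \qquad (5)$$

where $c$ is a constant [17]. Hence, combining (4) and (5), and noting that $\log^*(mn)/(mn) \to 0$, proves the desired result. ■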
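
Kolmogorov complexity itself cannot be computed, but for any fixed lossless compressor, the compressed length plus the length of the decompressor upper-bounds $K$ up to an additive constant. The Python sketch below uses zlib as such a stand-in (an assumption for illustration only; zlib is a practical compressor, not a universal computer) to illustrate Proposition 1: the bit string $[X_1]_m, \ldots, [X_n]_m$ of iid uniform samples does not compress, while a sparse signal compresses far below $mn$ bits. Because the proxy is only an upper bound, a ratio near 1 is consistent with, but does not prove, incompressibility.

```python
import random
import zlib


def quantized_bits(xs, m):
    """Concatenate the m-bit truncations [x_i]_m of x_i in [0, 1) into one bit list."""
    bits = []
    for x in xs:
        q = min(int(x * 2**m), 2**m - 1)  # integer whose m-bit pattern is [x]_m
        bits.extend((q >> (m - 1 - j)) & 1 for j in range(m))
    return bits


def pack(bits):
    """Pack a 0/1 list into bytes, zero-padding the last byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def compression_ratio(xs, m):
    """zlib-compressed length (bits) divided by the raw length mn.

    zlib stands in for a fixed decompressor, so this is only a computable
    upper-bound proxy for K([x^n]_m) / (mn), never the complexity itself.
    """
    bits = quantized_bits(xs, m)
    return 8 * len(zlib.compress(pack(bits), 9)) / len(bits)


rng = random.Random(0)
n, m = 256, 32
uniform = [rng.random() for _ in range(n)]   # X_i iid U[0, 1]
sparse = [0.0] * n                           # a highly structured signal
sparse[10], sparse[200] = 0.3, 0.8
print(compression_ratio(uniform, m))  # near 1: zlib finds no redundancy
print(compression_ratio(sparse, m))   # far below 1
```
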

IV. Our Contribution

Consider the problem of reconstructing a vector $x_o^n \in \mathbb{R}^n$ from $d$ random linear measurements $y_o^d = Ax_o^n$ with $d < n$. We say a recovery algorithm is successful if, as $n$ grows, the $\ell_2$-error between $x_o^n$ and its reconstruction $\hat x_o^n$ goes to zero, i.e., we want

$$P(\|x_o^n - \hat x_o^n\|_2 > \epsilon) \to 0$$

for any $\epsilon > 0$. Assuming that the signal is 'structured' in a sense that will be clarified later, we follow Occam's Razor and seek the simplest solution of $y = Ax$, i.e.,

$$\arg\min K^{[\cdot]_m}(x_1, \ldots, x_n) \quad \text{s.t.}\quad Ax^n = y_o^d. \qquad (6)$$

We call this algorithm minimum complexity pursuit, or MCP. The choice of $m$ will be clarified later as well. Suppose that $A \in \mathbb{R}^{d\times n}$, where the $A_{ij}$ are iid $\mathcal{N}(0, 1/d)$, and assume that $y_o^d = Ax_o^n$. Let $\hat x_o^n = \hat x_o^n(y_o^d, A)$ denote the output of (6) for the inputs $y_o^d$ and $A$.
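
Problem (6) cannot be solved as stated, since $K^{[\cdot]_m}$ is not computable. To illustrate only the 'simplest consistent solution' logic, the Python sketch below replaces $K^{[\cdot]_m}$ by a hypothetical computable surrogate: signals are restricted to $\{0,1\}^n$ and the 'complexity' of $x$ is its number of ones. Candidates are enumerated in order of increasing surrogate complexity and the first one consistent with $y = Ax$ is returned. The names, the tolerance, and the surrogate are all assumptions, and the search is exponential in $n$.

```python
import itertools
import math
import random


def gaussian_matrix(d, n, rng):
    """A in R^{d x n} with iid N(0, 1/d) entries, matching the assumption on A."""
    sd = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(d)]


def matvec(A, x):
    """Compute the product A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def mcp_toy(A, y, n, tol=1e-9):
    """Exhaustive 'simplest consistent solution' search over binary vectors.

    Hypothetical surrogate for (6): candidates are x in {0,1}^n and their
    complexity is taken to be the number of ones. Candidates are scanned in
    order of increasing surrogate complexity; the first one with
    ||A x - y||_2 <= tol is returned.
    """
    for weight in range(n + 1):
        for support in itertools.combinations(range(n), weight):
            x = [0] * n
            for i in support:
                x[i] = 1
            if math.dist(matvec(A, x), y) <= tol:
                return x
    return None  # no binary vector is consistent with y


rng = random.Random(1)
n, d = 12, 6
x_true = [0] * n
x_true[3] = x_true[8] = 1
A = gaussian_matrix(d, n, rng)
y = matvec(A, x_true)
x_hat = mcp_toy(A, y, n)
print(x_hat == x_true)
```

Because the candidate set is finite and $A$ is Gaussian, exact consistency singles out the true vector for almost every $A$ in this toy; it only illustrates the search order behind (6) and says nothing about the $O(\kappa\log n)$ measurement count, which concerns real-valued signals and their quantization.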

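Theorem 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ is such that

$$\limsup_{n\to\infty}\frac{K^{[\cdot]_m}(x_{o,1}, x_{o,2}, \ldots, x_{o,n})}{m} \le \kappa, \qquad (7)$$

where $m = m_n = \lceil\log n\rceil$. Let $d = d_n = \lceil\kappa\log n\rceil$. Then, for any $\epsilon > 0$,

$$P(\|x_o^n - \hat x_o^n\|_2 > \epsilon) \to 0, \qquad (8)$$

as $n$ grows without bound.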

This theorem indicates that when the normalized Kolmogorov complexity $K^{[\cdot]_m}(x_o^n)/m$ of the signal is at most $\kappa$, then $O(\kappa\log n)$ linear measurements are sufficient for successful recovery. It also provides evidence for the success of Occam's Razor.

Although Theorem 1 is an asymptotic result, its proof provides information on the performance of MCP on finite-length sequences as well.

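Corollary 1: Assume that $x_o^n = (x_{o,1}, x_{o,2}, \ldots, x_{o,n}) \in [0,1]^n$ is such that

$$\frac{K^{[\cdot]_m}(x_o^n)}{m} \le \kappa, \quad \forall\, m.$$

Let $m = m_n = \lceil\alpha\log n\rceil$ and $d = d_n = \lceil 2\alpha\kappa\log n\rceil$. Then, with probability $1 - n^{\ldots}$,

$$\|x_o^n - \hat x_o^n\|_2 \le \frac{\ldots}{\sqrt{\kappa\log n}}.$$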

Now consider the following more general setting, where the original signal $x_o^n$ to be recovered is not of low complexity, but is close to a low-complexity signal $\tilde x_o^n$, i.e., $\|x_o^n - \tilde x_o^n\|_2 \le \epsilon_n$ with $\epsilon_n = o(1)$. Again, let $y_o^d = Ax_o^n$, and consider the following reconstruction algorithm for finding $x_o^n$ from its linear measurements $y_o^d$:

$$\min K^{[\cdot]_m}(x_1, \ldots, x_n) \quad \text{s.t.}\quad \|Ax^n - y_o^d\|_2 \le \sigma_{\max}(A)\epsilon_n.$$

Assume that $A \in \mathbb{R}^{d\times n}$ and the $A_{ij}$ are iid $\mathcal{N}(0, 1/d)$. Let $\hat x_o^n$ denote the output of this algorithm.

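Theorem 2: Assume that there exists $\tilde x_o^n$ such that $\|x_o^n - \tilde x_o^n\|_2 \le \epsilon_n$, and

$$\limsup_{n\to\infty}\frac{K^{[\cdot]_m}(\tilde x_o^n)}{m} \le \kappa_n. \qquad (9)$$

Let $m = m_n = \lceil\log n\rceil$ and $d = d_n = \lceil\kappa_n\log n\rceil$. If $\epsilon_n = o(\sqrt{d_n/n})$, then for each $\epsilon > 0$,

$$P(\|x_o^n - \hat x_o^n\|_2 > \epsilon) \to 0, \qquad (10)$$

as $n$ grows without bound.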

In the next section, we show that several popular classes of sequences studied in CS, such as the class of sparse signals and samples of piecewise smooth functions, can be considered special cases of the framework introduced in this section, and that Theorems 1 and 2 provide useful information about them.

V. Applications 

It is well known that Kolmogorov complexity is not computable. Informally, the only way to find the shortest program that generates a sequence is to run all short programs and check whether they generate the sequence. However, some short programs may never halt, and there is no general way to decide whether a given program will halt. Hence, there is no effective way to calculate the Kolmogorov complexity. It is, however, usually possible to find upper bounds on it. In this section, we consider several popular examples and provide upper bounds on their Kolmogorov complexity. Based on these upper bounds, we use Theorems 1 and 2 to calculate the number of random linear measurements required by MCP to recover these signals. This demonstrates the connection between the results of Section IV and the compressed sensing and finite rate of innovation frameworks explained in Section I. It is straightforward to extend the results to the other classes discussed in Section I.

A. Sparsity 

Let the signal $x_o^n = (x_{o,1}, x_{o,2}, \ldots, x_{o,n})$ be $k$-sparse. Consider the following program for describing $[x_o^n]_m$. First, use a program of constant length to describe the structure of the signal as 'sparse' and the ordering of the rest of the information. Then, spend $\log^* n + c$ bits to describe the length of the signal. Next, code the sparsity level $k$ with $\log^* k$ bits, and spend $k(\log^* n + c)$ more bits to code the locations of the $k$ nonzero elements. Finally, use $km$ more bits to describe the quantized magnitudes of the nonzero coefficients. Therefore, we have

$$\frac{K^{[\cdot]_m}(x_o^n)}{m} \le k + \frac{(k+1)(\log^* n + c) + \log^* k + c}{m}. \qquad (11)$$

With $m = \lceil\log n\rceil$, we have $\log^* n / m \to 1$, so the right-hand side of (11) tends to $2k+1$. Plugging (11) into Theorem 1, we conclude that $\lceil(2k+1)\log n\rceil$ measurements are sufficient for the recovery of $k$-sparse signals.
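
The bit count behind (11) can be tallied directly. In this Python sketch, the constant $c$ is a hypothetical placeholder (the text leaves it unspecified), logarithms are taken to base 2, and $m = \lceil\log_2 n\rceil$. It shows the normalized bound approaching $2k+1$ from above, and only slowly, because $\log^* n / m$ tends to 1 at rate $O(\log\log n/\log n)$.

```python
import math


def log_star(n: int) -> float:
    """log* n as defined in Section II."""
    c = math.ceil(math.log2(n))
    return c + 2 * math.log2(max(c, 1))


def sparse_description_bits(n, k, m, c=8.0):
    """Length (bits) of the program sketched above for [x^n]_m with x^n k-sparse.

    c is a hypothetical value standing in for the unspecified constants.
    """
    header = c                          # "sparse" model + field ordering
    length = log_star(n) + c            # the ambient dimension n
    sparsity = log_star(k)              # the sparsity level k
    locations = k * (log_star(n) + c)   # positions of the nonzeros
    magnitudes = k * m                  # m-bit values of the nonzeros
    return header + length + sparsity + locations + magnitudes


def normalized_complexity_bound(n, k, c=8.0):
    """Right-hand side of (11) with m = ceil(log2 n)."""
    m = math.ceil(math.log2(n))
    return sparse_description_bits(n, k, m, c) / m


def measurements(n, k):
    """ceil((2k + 1) log2 n), the count suggested by Theorem 1."""
    return math.ceil((2 * k + 1) * math.log2(n))


for exp in (10, 20, 40, 80):
    n = 2**exp
    print(exp, round(normalized_complexity_bound(n, k=5), 2), measurements(n, 5))
```
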

B. Piecewise polynomial 

Let $(x_{o,1}, x_{o,2}, \ldots, x_{o,n})$ be samples of a piecewise polynomial function $f(x)$ defined on $[0,1]$ at locations $(0, 1/n, \ldots, (n-1)/n)$. Further, assume that $0 \le f(x) \le 1$ for every $x$. Let $\mathrm{Poly}_N^Q$ represent the class of such functions that have at most $Q$ singularities² and in which $N$ is the maximum degree of each polynomial. Let $\{a_i^\ell\}_{i=0}^{N_\ell}$ denote the set of coefficients of the $\ell$th polynomial, where $N_\ell \le N$ denotes its degree. For notational simplicity, we assume that the coefficients of each polynomial belong to the $[0,1]$ interval and that $\sum_{i=0}^{N_\ell} a_i^\ell \le 1$ for every $\ell$, where $a_i^\ell$ is the $i$th coefficient of the $\ell$th polynomial. For a given length $n$, we derive an upper bound on the Kolmogorov complexity. Consider the following program for describing $[x_o^n]_m$. The code first specifies the model as 'piecewise polynomial' with parameters $(n, Q, N)$. This requires $\log^* n + \log^* N + \log^* Q + c_1$ bits. Then, for each singularity point, the code determines the largest sampling point $i/n$ that is smaller than it. Since there are at most $Q$ singularity points, describing this information requires at most $Q(\log^* n + c_2)$ bits. The next step is to describe the coefficients of each polynomial. Using an $m'$-bit quantizer for each coefficient, the induced error is bounded by

$$\left|\sum_{i=0}^{N_\ell} a_i^\ell t^i - \sum_{i=0}^{N_\ell} [a_i^\ell]_{m'} t^i\right| \le \sum_{i=0}^{N_\ell}\left|a_i^\ell - [a_i^\ell]_{m'}\right| \le (N+1)2^{-m'}, \quad t \in [0,1]. \qquad (12)$$

To ensure that we are able to reconstruct the $m$-bit resolution of the samples from this description, we require $(N+1)2^{-m'} \le 2^{-m}$, i.e., $m' = m + \lceil\log_2(N+1)\rceil$. Therefore, describing the polynomials' coefficients requires $(Q+1)(N+1)(m + \lceil\log_2(N+1)\rceil)$ extra bits. Hence, overall, we conclude that

$$\frac{K^{[\cdot]_m}(x_{o,1}, x_{o,2}, \ldots, x_{o,n})}{m} \le (Q+1)(N+1) + \frac{(Q+1)(N+1)\lceil\log_2(N+1)\rceil}{m} + \frac{\log^* n + \log^* N + \log^* Q + Q\log^* n + c_1 + c_2}{m}. \qquad (13)$$

It is straightforward to plug (13) into Theorem 2 and prove that, roughly speaking, for large values of $n$, $(Q+1)(N+2)\log n$ measurements are sufficient for the successful recovery of piecewise polynomial functions.

²A singularity is a point at which the function is not infinitely differentiable.
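
The coefficient budget in (12) can be checked numerically. The Python sketch below (function names hypothetical) draws random coefficients in $[0,1]$, quantizes each to $m' = m + \lceil\log_2(N+1)\rceil$ bits, and measures the worst-case deviation of the polynomial on a grid of $[0,1]$. The constraint $\sum_i a_i^\ell \le 1$ is not needed for (12) and is not enforced. A grid check is a sanity test, not a proof.

```python
import math
import random


def quantize(x, bits):
    """[x]_bits for x in [0, 1] (truncation; 1 maps to 1 - 2^-bits)."""
    return min(math.floor(x * 2**bits), 2**bits - 1) / 2**bits


def poly(coeffs, t):
    """Evaluate sum_i coeffs[i] * t**i."""
    return sum(a * t**i for i, a in enumerate(coeffs))


def coefficient_bits(N, m):
    """m' = m + ceil(log2(N + 1)), so that (N + 1) 2^-m' <= 2^-m."""
    return m + math.ceil(math.log2(N + 1))


def max_error_on_grid(coeffs, m, grid=2000):
    """max_t |p(t) - p_q(t)| over a grid of [0, 1], with m'-bit coefficients."""
    N = len(coeffs) - 1
    mp = coefficient_bits(N, m)
    q = [quantize(a, mp) for a in coeffs]
    return max(abs(poly(coeffs, j / grid) - poly(q, j / grid)) for j in range(grid + 1))


rng = random.Random(0)
N, m = 5, 10
coeffs = [rng.random() for _ in range(N + 1)]
err = max_error_on_grid(coeffs, m)
print(err, (N + 1) * 2.0 ** -coefficient_bits(N, m), 2.0 ** -m)
```
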

So far we have considered examples of low-complexity signals. However, in many applications the signals are not of low complexity but are rather close to low-complexity signals. We present several such examples here.

C. $\ell_p$-constrained signals

While sparse signals have played an important role in the theory of compressed sensing, it is well known that exactly sparse signals rarely occur in practice. More accurate models assume either that the magnitude of the signal follows a specific decay or that the signal belongs to an $\ell_p$ ball with $p < 1$, i.e., $\|x_o\|_p \le 1$ [1], [18]. For the signal $x_o \in \mathbb{R}^n$ with $\|x_o\|_p \le 1$, let $(x_{o,(1)}, x_{o,(2)}, \ldots, x_{o,(n)})$ denote the permuted version of $x_o$ such that $|x_{o,(1)}| \ge |x_{o,(2)}| \ge \cdots \ge |x_{o,(n)}|$. It is easy to show that $|x_{o,(i)}| \le i^{-1/p}$, since $i|x_{o,(i)}|^p \le \sum_{j=1}^i |x_{o,(j)}|^p \le 1$. Therefore, if we keep just the $k$ largest coefficients of this signal and set the rest to zero, the resulting $k$-sparse vector $\tilde x_o$ satisfies $\|x_o - \tilde x_o\|_2 \le k^{\frac12 - \frac1p}$. Setting the sparsity $k$ to $n^{p/2}$, Theorem 2 proves that $d_n = O(n^{p/2}\log n)$ samples are sufficient for asymptotically accurate recovery. It is interesting to note that as $p$ decreases, the decay rate increases and the number of measurements required for successful recovery decreases.
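
The two facts used above, $|x_{o,(i)}| \le i^{-1/p}$ and $\|x_o - \tilde x_o\|_2 \le k^{1/2 - 1/p}$ for $p \le 1$, are easy to test numerically; the second follows from $\sum_{i>k} i^{-2/p} \le \int_k^\infty t^{-2/p}\,dt = k^{1-2/p}/(2/p - 1)$. In this Python sketch, the test signal (cubed Gaussian entries rescaled to $\|x\|_p = 1$) and the function names are hypothetical choices.

```python
import math
import random


def lp_normalized(x, p):
    """Rescale x so that ||x||_p = 1."""
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    return [v / norm for v in x]


def best_k_term_error(x, k):
    """||x - x_k||_2, where x_k keeps the k largest-magnitude entries of x."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return math.sqrt(sum(v * v for v in mags[k:]))


rng = random.Random(0)
n, p = 10_000, 0.5
# Hypothetical test signal: cubed Gaussians give a few large and many small entries.
x = lp_normalized([rng.gauss(0.0, 1.0) ** 3 for _ in range(n)], p)

# Sorted magnitudes obey |x_(i)| <= i^(-1/p) (small slack for rounding).
mags = sorted((abs(v) for v in x), reverse=True)
assert all(mags[i - 1] <= i ** (-1.0 / p) + 1e-12 for i in range(1, n + 1))

k = round(n ** (p / 2))  # k = n^{p/2} = 10 here
print(k, best_k_term_error(x, k), k ** (0.5 - 1.0 / p))
```
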

D. Smooth functions 

Suppose that $x_1, x_2, \ldots, x_n$ are equispaced samples of a smooth function $f: [0,1] \to \mathbb{R}$ with $0 \le f(x) \le 1$. Let the function be $\beta+1$ times differentiable with $\|f^{(\beta+1)}\|_\infty \le \gamma$. For notational simplicity, we assume that $|f^{(j)}(x)| \le 1$ for every $j < \beta+1$. This function is not necessarily a low-complexity signal, but it can be well approximated by a piecewise polynomial function. To show this, consider partitioning the $[0,1]$ interval into subintervals of size $r_n$ and approximating the function $f$ with a polynomial of degree $\beta$ in each subinterval. Let $f_\beta(x)$ denote the resulting piecewise polynomial function. It is easy to prove (e.g., via a Taylor expansion in each subinterval) that $\|f - f_\beta\|_\infty \le \gamma r_n^{\beta+1}$. Hence, if $x$ and $x_o$ denote the vectors of equispaced samples of the original signal and of its piecewise polynomial approximation, respectively, it follows that $\|x - x_o\|_2 \le \gamma\sqrt{n}\, r_n^{\beta+1}$.

On the other hand, the complexity of the piecewise polynomial signal is essentially proportional to $\beta/r_n$. Setting $r_n = \ldots$, Theorem 2 proves that $d_n = O(n^{\ldots}\log n)$ is enough for the accurate recovery of the samples of such signals. Clearly, for $\beta < 1$, this bound indicates that the number of samples we need is of the same order as the ambient dimension. However, as $\beta$ increases, fewer samples are required.
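
A quick numerical check of $\|x - x_o\|_2 \le \gamma\sqrt{n}\, r_n^{\beta+1}$: the Python sketch below takes the hypothetical test function $f(t) = (1 + \sin t)/2$, for which $0 \le f \le 1$ and every derivative is bounded by $\gamma = 1/2$, and builds $f_\beta$ from Taylor polynomials of degree $\beta$ expanded at the left end of each subinterval. Taylor expansion is one valid choice of local polynomial (our assumption); its remainder, $\gamma r_n^{\beta+1}/(\beta+1)!$, is even smaller than the bound used in the text.

```python
import math


def f(t):
    """Hypothetical smooth test function with 0 <= f <= 1 and |f^(j)| <= 1/2."""
    return 0.5 * (1.0 + math.sin(t))


def f_deriv(j, a):
    """j-th derivative of f at a, for j >= 1: 0.5 * sin(a + j*pi/2)."""
    return 0.5 * math.sin(a + j * math.pi / 2)


def f_beta(t, r, beta):
    """Degree-beta Taylor polynomial of f around the left end of t's cell of width r."""
    a = math.floor(t / r) * r
    return f(a) + sum(f_deriv(j, a) * (t - a) ** j / math.factorial(j)
                      for j in range(1, beta + 1))


def sample_error(n, r, beta):
    """||x - x_o||_2 between n equispaced samples of f and of f_beta."""
    return math.sqrt(sum((f(i / n) - f_beta(i / n, r, beta)) ** 2 for i in range(n)))


n, beta, gamma = 1000, 2, 0.5  # |f^(beta+1)| <= gamma = 1/2 for this f
for r in (0.2, 0.1, 0.05):
    print(r, sample_error(n, r, beta), gamma * math.sqrt(n) * r ** (beta + 1))
```
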

Similar results hold for the piecewise smooth functions, 
which are very popular in image and signal processing. 

VI. Related work 

Our work is inspired by [19] and [20]. [19] considers the well-studied problem of estimation, where the goal is to recover a vector $\theta$ from its noisy observations $s = \theta + z$, where $z$ represents the noise in the system. It then suggests using the minimum Kolmogorov complexity estimation (MKCE) approach and proves that, under several scenarios for the signal and noise, the average marginal distribution of the MKCE estimate tends to the actual posterior distribution. On the other hand, [20] considers the problem of compressed sensing over binary sequences. Consider the set of all binary sequences with Kolmogorov complexity less than or equal to $k_0$, i.e.,

$$\mathcal{S}(k_0) = \{x : K(x) \le k_0\}.$$

Let $A$ denote a $d\times n$ binary matrix, $x_o = (x_1, x_2, \ldots, x_n)^T$, and $y_o = Ax_o$. Consider the following algorithm for reconstructing the signal $x_o$ from its linear measurements $y_o$:

$$\hat x(y_o, A) = \arg\min_{x:\, y_o = Ax} K(x). \qquad (14)$$

[20] considers this scheme and proves that $2k_0$ random linear binary measurements are sufficient for recovering the binary sequences in $\mathcal{S}(k_0)$ with high probability. This result does not provide any information on the successful recovery of real signals, and it does not consider non-idealities in the signals either. Our paper addresses both issues.

As mentioned in Section I, the problem we discuss in this paper is a central problem in the field of compressed sensing [1], [2]. Several papers have considered different generalizations of sparsity [5], [6], [11], [12]. As mentioned before, all these models can be considered as subclasses of the general model we consider here. However, it is worth noting that, even though the recovery approach proposed in this paper is universal, it is not useful for practical purposes, since Kolmogorov complexity is not computable.

In this paper, we considered deterministic models for the signals. Similar extensions have been considered in random settings as well. For instance, [21] considers the problem of recovering a memoryless process from a linear set of measurements and establishes a connection between the number of measurements required and the Rényi entropy. Also, our work is in the same spirit as the minimum entropy decoder proposed by Csiszár in [22], a universal decoder for reconstructing an iid signal from its linear measurements at a rate determined by the entropy of the source.

VII. Proof of Theorem 1 

The following Lemma will be used in the proof of the 
main theorem. 



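Lemma 2 (Chi-square concentration): Fix $\tau \in (0,1)$ and $x \in \mathbb{R}^n$. Assume that $\|x\|_2 = 1$. Let $Z_i = \sum_{j=1}^n A_{ij}x_j$, $i = 1, 2, \ldots, d$. We then have

$$P\left(\sum_{i=1}^d Z_i^2 - 1 \le -\tau\right) \le e^{\frac d2(\tau + \log(1-\tau))}. \qquad (15)$$

Proof: Note that $\{Z_i\}_{i=1}^d$ are iid $\mathcal{N}(0, 1/d)$. By Markov's inequality, for any $\lambda > 0$, we have

$$\begin{aligned} P\left(\sum_{i=1}^d Z_i^2 - 1 \le -\tau\right) &= P\left(-\sum_{i=1}^d Z_i^2 + 1 \ge \tau\right) = P\left(e^{-\lambda\sum_{i=1}^d Z_i^2 + \lambda} \ge e^{\lambda\tau}\right)\\ &\le e^{-\lambda\tau+\lambda}\, E\left[e^{-\lambda\sum_{i=1}^d Z_i^2}\right] = e^{-\lambda\tau+\lambda}\left(1 + \frac{2\lambda}{d}\right)^{-d/2}. \end{aligned} \qquad (16)$$

We optimize over $\lambda$ to obtain

$$\lambda^* = \frac{d\tau}{2(1-\tau)}. \qquad (17)$$

If we plug (17) into (16), we obtain (15). ■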
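
Lemma 2 is a Chernoff bound, so it can be checked in a few lines. The Python sketch below (function names hypothetical) evaluates the exponent in (16) to confirm that $\lambda^*$ from (17) attains the exponent in (15), and estimates the left-hand side of (15) by Monte Carlo. Since $\|x\|_2 = 1$, each $Z_i$ is exactly $\mathcal{N}(0, 1/d)$, so the sketch samples $Z_i$ directly instead of forming $A$; the values $d = 20$ and $\tau = 0.5$ are arbitrary.

```python
import math
import random


def lemma2_bound(d, tau):
    """Right-hand side of (15): exp((d/2) * (tau + ln(1 - tau))), for 0 < tau < 1."""
    return math.exp(0.5 * d * (tau + math.log(1.0 - tau)))


def chernoff_exponent(lam, d, tau):
    """Natural log of the bound in (16) for a given lambda > 0."""
    return -lam * tau + lam - 0.5 * d * math.log(1.0 + 2.0 * lam / d)


def lower_tail_frequency(d, tau, trials, rng):
    """Monte Carlo estimate of P(sum_i Z_i^2 - 1 <= -tau), Z_i iid N(0, 1/d)."""
    sd = 1.0 / math.sqrt(d)
    hits = sum(
        sum(rng.gauss(0.0, sd) ** 2 for _ in range(d)) - 1.0 <= -tau
        for _ in range(trials)
    )
    return hits / trials


d, tau = 20, 0.5
lam_star = d * tau / (2 * (1 - tau))   # lambda* from (17)
rng = random.Random(0)
print(lower_tail_frequency(d, tau, 10_000, rng), lemma2_bound(d, tau))
```
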
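Proof of Theorem 1: Let $e_o^n = x_o^n - [x_o^n]_m$ and $\hat e_o^n = \hat x_o^n - [\hat x_o^n]_m$ denote the quantization errors of the original and the reconstructed signals, respectively. Since both $Ax_o^n = y_o^d$ and $A\hat x_o^n = y_o^d$, it follows that

$$A([x_o^n]_m + e_o^n) = A([\hat x_o^n]_m + \hat e_o^n),$$

and

$$A([\hat x_o^n]_m - [x_o^n]_m) = A(e_o^n - \hat e_o^n). \qquad (18)$$

On the other hand, since $0 \le y - [y]_m \le 2^{-m}$ for each $y \in [0,1]$, we have

$$\|e_o^n - \hat e_o^n\|_2 \le \sqrt{n2^{-2m+1}}.$$

Hence,

$$\|A([\hat x_o^n]_m - [x_o^n]_m)\|_2 = \|A(e_o^n - \hat e_o^n)\|_2 \le \sigma_{\max}(A)\sqrt{n2^{-2m+1}}. \qquad (19)$$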

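Since, by assumption, (7) holds for $x_o$, for each $\delta > 0$ there exists $N_\delta$ such that for any $n > N_\delta$,

$$\frac{K^{[\cdot]_m}(x_o^n)}{m} \le \kappa + \delta. \qquad (20)$$

Since $\hat x_o^n$ is the solution of (6),

$$K^{[\cdot]_m}(\hat x_o^n) \le K^{[\cdot]_m}(x_o^n). \qquad (21)$$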

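Moreover,

$$K([\hat x_o^n]_m - [x_o^n]_m) \le K^{[\cdot]_m}(\hat x_o^n) + K^{[\cdot]_m}(x_o^n) + C, \qquad (22)$$

where $C$ is a constant independent of all the other variables in the problem [17]. Combining (20), (21) and (22) yields

$$K([\hat x_o^n]_m - [x_o^n]_m) \le 2(\kappa+\delta)m + C. \qquad (23)$$
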

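If, for some fixed $\tau > 0$, $\|Ay^n\|_2 \ge \tau\|y^n\|_2$ for each sequence $y^n$ with $K(y^n) \le 2(\kappa+\delta)m + C$, then, applying this to $y^n = [\hat x_o^n]_m - [x_o^n]_m$ (which satisfies (23)) and using (19),

$$\begin{aligned} \|x_o^n - \hat x_o^n\|_2 &= \|[x_o^n]_m + e_o^n - [\hat x_o^n]_m - \hat e_o^n\|_2\\ &\le \|[x_o^n]_m - [\hat x_o^n]_m\|_2 + \|e_o^n - \hat e_o^n\|_2\\ &\le \tau^{-1}\sigma_{\max}(A)\sqrt{n2^{-2m+1}} + \sqrt{n2^{-2m+1}}\\ &= \left(\tau^{-1}\sigma_{\max}(A) + 1\right)\sqrt{n2^{-2m+1}}. \end{aligned} \qquad (24)$$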
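Define the events $\mathcal{E}_1^{(n)}$ and $\mathcal{E}_2^{(n)}$ as

$$\mathcal{E}_1^{(n)} \triangleq \left\{A : \|Ay^n\|_2 \ge \tau\|y^n\|_2 \text{ for all } y^n \text{ with } K(y^n) \le 2(\kappa+\delta)m + C\right\}, \qquad (25)$$

and

$$\mathcal{E}_2^{(n)} \triangleq \left\{A \in \mathbb{R}^{d\times n} : \sigma_{\max}(A) - 1 - \sqrt{n/d} \le t\right\}, \qquad (26)$$

for some $t > 0$.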

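Using these definitions plus the union bound, it follows that

$$\begin{aligned} P(\|x_o^n - \hat x_o^n\|_2 > \epsilon) &\le P\left(\|x_o^n - \hat x_o^n\|_2 > \epsilon,\ \mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)}\right) + P\left(\|x_o^n - \hat x_o^n\|_2 > \epsilon,\ (\mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)})^c\right)\\ &\le P\left(\|x_o^n - \hat x_o^n\|_2 > \epsilon,\ \mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)}\right) + P\left((\mathcal{E}_1^{(n)})^c\right) + P\left((\mathcal{E}_2^{(n)})^c\right). \end{aligned} \qquad (27)$$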



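If $A \in \mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)}$, then from (24)

$$\|x_o^n - \hat x_o^n\|_2 \le \left(\tau^{-1}\left(\sqrt{n/d} + 1 + t\right) + 1\right)\sqrt{n2^{-2m+1}}. \qquad (28)$$

Since, by assumption, $m = m_n = \lceil\log n\rceil$ and $d = d_n = \lceil\kappa\log n\rceil$, we have $2^{-2m} \le n^{-2}$, so $\sqrt{n2^{-2m+1}} \le \sqrt{2/n}$ and the right-hand side of (28) is $O(1/\sqrt{d_n})$. Hence, if $n$ is large enough,

$$\left(\tau^{-1}\left(\sqrt{n/d} + 1 + t\right) + 1\right)\sqrt{n2^{-2m+1}} < \epsilon. \qquad (29)$$

Hence, for $n$ large enough,

$$P\left(\|x_o^n - \hat x_o^n\|_2 > \epsilon,\ \mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)}\right) = 0. \qquad (30)$$
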



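On the other hand, by Lemma 2 (applied with $1-\tau^2$ in place of $\tau$), for each sequence $x^n \in \mathbb{R}^n$,

$$P\left(\|Ax^n\|_2 < \tau\|x^n\|_2\right) = P\left(\left\|A\frac{x^n}{\|x^n\|_2}\right\|_2^2 < \tau^2\right) \le e^{\frac d2(1-\tau^2+2\log\tau)}. \qquad (31)$$

Therefore,

$$\begin{aligned} P\left((\mathcal{E}_1^{(n)})^c\right) &= P\left(\exists\, y^n : K(y^n) \le 2(\kappa+\delta)m + C,\ \|Ay^n\|_2 < \tau\|y^n\|_2\right)\\ &\le 2^{2(\kappa+\delta)m+C}\, e^{\frac d2(1-\tau^2+2\log\tau)}. \end{aligned} \qquad (32)$$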



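If we set $\tau = 0.04$ and $d = \lceil\kappa\log n\rceil$, then $1 - \tau^2 + 2\log\tau < 0$, and it is simple to see that, for $\delta$ small enough, this probability goes to zero. Finally, we can use the concentration of Lipschitz functions of a Gaussian random vector to prove [23]

$$P\left((\mathcal{E}_2^{(n)})^c\right) = P\left(\sigma_{\max}(A) - 1 - \sqrt{n/d} > t\right) \le e^{-dt^2/2}. \qquad (33)$$

Setting $t$ to a constant and $d = \lceil\kappa\log n\rceil$ proves that this probability also goes to zero. ■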
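
To see how the three quantities in the proof behave for concrete sizes, the Python sketch below evaluates the right-hand sides of (28), (32) and (33). Base-2 logarithms for $m$ and $d$, $\kappa = 2$, $\delta = 0.1$, $C = 10$ and $t = 1$ are hypothetical choices; only $\tau = 0.04$ comes from the text.

```python
import math


def params(n, kappa):
    """m = ceil(log2 n) and d = ceil(kappa * log2 n); base-2 logs are assumed."""
    return math.ceil(math.log2(n)), math.ceil(kappa * math.log2(n))


def log2_bound_32(n, kappa, delta=0.1, C=10.0, tau=0.04):
    """log2 of the right-hand side of (32); delta and C are hypothetical values."""
    m, d = params(n, kappa)
    exponent_nats = 0.5 * d * (1 - tau**2 + 2 * math.log(tau))  # < 0 for tau = 0.04
    return 2 * (kappa + delta) * m + C + exponent_nats / math.log(2)


def bound_33(n, kappa, t=1.0):
    """Right-hand side of (33): exp(-d t^2 / 2)."""
    _, d = params(n, kappa)
    return math.exp(-d * t * t / 2)


def bound_28(n, kappa, t=1.0, tau=0.04):
    """Right-hand side of (28): the error bound on the event E1 and E2."""
    m, d = params(n, kappa)
    quant = math.sqrt(n * 2.0 ** (-2 * m + 1))
    return ((math.sqrt(n / d) + 1 + t) / tau + 1) * quant


for n in (10**3, 10**6, 10**12):
    print(n, log2_bound_32(n, 2.0), bound_33(n, 2.0), bound_28(n, 2.0))
```

With these choices, the failure probabilities (32) and (33) shrink quickly, whereas the deterministic error bound (28) decreases only like $1/\sqrt{\log n}$, because $\tau^{-1} = 25$ multiplies $\sqrt{n/d}$; in this check it is still above 1 at $n = 10^{12}$. So "n large enough" in (29) can mean very large $n$.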

VIII. Proof of Theorem 2 

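Let $x_o^n = [x_o^n]_m + e_o^n$, $\hat x_o^n = [\hat x_o^n]_m + \hat e_o^n$, and $\tilde x_o^n = [\tilde x_o^n]_m + \tilde e_o^n$.

Note that since $\|A\tilde x_o^n - y_o^d\|_2 = \|A(\tilde x_o^n - x_o^n)\|_2 \le \sigma_{\max}(A)\epsilon_n$, $\tilde x_o^n$ is also a feasible solution. Therefore, since $\tilde x_o^n$ and $\hat x_o^n$ are both feasible, by the triangle inequality,

$$\|A\tilde x_o^n - A\hat x_o^n\|_2 = \|A\tilde x_o^n - y_o^d - (A\hat x_o^n - y_o^d)\|_2 \le 2\sigma_{\max}(A)\epsilon_n. \qquad (34)$$

Again, by the triangle inequality,

$$\begin{aligned} \|A\tilde x_o^n - A\hat x_o^n\|_2 &= \|A([\tilde x_o^n]_m + \tilde e_o^n) - A([\hat x_o^n]_m + \hat e_o^n)\|_2\\ &\ge \|A([\tilde x_o^n]_m - [\hat x_o^n]_m)\|_2 - \|A(\tilde e_o^n - \hat e_o^n)\|_2\\ &\ge \|A([\tilde x_o^n]_m - [\hat x_o^n]_m)\|_2 - \sigma_{\max}(A)\|\tilde e_o^n - \hat e_o^n\|_2\\ &\ge \|A([\tilde x_o^n]_m - [\hat x_o^n]_m)\|_2 - \sigma_{\max}(A)\sqrt{n2^{-2m+1}}. \end{aligned} \qquad (35)$$

Combining (34) and (35), it follows that

$$\|A([\tilde x_o^n]_m - [\hat x_o^n]_m)\|_2 \le \sigma_{\max}(A)\sqrt{n2^{-2m+1}} + 2\sigma_{\max}(A)\epsilon_n. \qquad (36)$$

Since both $\tilde x_o^n$ and $\hat x_o^n$ are feasible, and $\hat x_o^n$ is the optimizer of the minimization problem that defines it, we have, for any $\delta > 0$ and $n$ large enough,

$$K^{[\cdot]_m}(\hat x_o^n) \le K^{[\cdot]_m}(\tilde x_o^n) \le m(\kappa_n + \delta), \qquad (37)$$

and therefore

$$K([\hat x_o^n]_m - [\tilde x_o^n]_m) \le 2m(\kappa_n + \delta) + C, \qquad (38)$$

where $C$ is a constant independent of $m$ and $n$.

Consider defining the events $\mathcal{E}_1^{(n)}$ and $\mathcal{E}_2^{(n)}$ as in (25) and (26) in the proof of Theorem 1, with $\kappa$ replaced by $\kappa_n$. Then, using the same argument as in that proof,

$$P(\|x_o^n - \hat x_o^n\|_2 > \epsilon) \le P\left(\|x_o^n - \hat x_o^n\|_2 > \epsilon,\ \mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)}\right) + P\left((\mathcal{E}_1^{(n)})^c\right) + P\left((\mathcal{E}_2^{(n)})^c\right). \qquad (39)$$

However, our choice of parameters guarantees that, for large enough $n$, $P(\|x_o^n - \hat x_o^n\|_2 > \epsilon,\ \mathcal{E}_1^{(n)}\cap\mathcal{E}_2^{(n)}) = 0$; moreover, $P((\mathcal{E}_1^{(n)})^c)$ and $P((\mathcal{E}_2^{(n)})^c)$ both go to 0 as $n$ grows to infinity. ■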



IX. Conclusion 

In this paper, we consider the problem of recovering structured signals from their linear measurements. We use the Kolmogorov complexity of the quantized signal as a universal measure of complexity that covers many different examples explored in the compressed sensing literature and related areas. We then show that, for low-complexity signals, the minimum complexity pursuit scheme inspired by Occam's razor, which seeks the simplest solution of a set of random linear measurements, recovers the signal. In fact, we prove that the number of measurements required grows proportionally to the complexity and only logarithmically with the ambient dimension of the signal. We also consider more practical scenarios where the signal is not 'simple' but is 'close' to a low-complexity signal. We show that even in such cases the minimum complexity pursuit algorithm provides a good estimate of the signal from far fewer samples than its ambient dimension.

As mentioned in the paper, the Kolmogorov complexity of a sequence is not computable. We are currently working on deriving implementable schemes by replacing Kolmogorov complexity with computable measures such as minimum description length [24].